AITopics | triangle inequality

triangle inequality

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

A Markov Chain Approach to Preference Alignment

Koriyama, Takuya, Liang, Tengyuan

arXiv.org Machine Learning | Jun-23-2026

We propose Markov Chain from Human Feedback (MCHF), an elementary approach for aligning generative models from pairwise human preferences. Unlike Reinforcement Learning from Human Feedback (RLHF), which reduces comparisons to a scalar reward, and Nash Learning from Human Feedback (NLHF), which preserves pairwise utilities through a KL-regularized minimax optimization, MCHF uses pairwise preferences directly to define a transition mechanism over model outputs. Given a pairwise utility $U(x,y)$, which quantifies human preference for $y$ over $x$, and a reference probability distribution $μ_{\mathsf{ref}}$, we define a Markov kernel $\mathsf{P}(x, dy)\propto \exp(U(x,y))μ_{\mathsf{ref}}(dy)$, and take the Markov chain starting from $μ_{\mathsf{ref}}$ as an iterative alignment procedure. We show that MCHF converges geometrically fast to the stationary distribution, with a convergence rate governed by the seminorm $\|U\|_\oplus=\inf_{g,f\in L^\infty(μ_{\mathsf{ref}})}\|U-g\oplus f\|_\infty$, which quantifies the non-transitive structure of the pairwise utility. We further show that a mirror-descent algorithm for NLHF satisfies an analogous structure-adaptive convergence guarantee. Finally, through a perturbation analysis, we prove that when $\|U\|_\oplus$ is small, MCHF and NLHF agree up to first order around an RLHF solution, which yields a unified view of reward-based, game-theoretic, and Markovian approaches to alignment. In particular, for two natural algorithms that converge to the MCHF/NLHF equilibria, we show that the first step of MCHF and NLHF recovers the RLHF solution based on the column-sum reward $\hat{f}(y)=\int μ_{\mathsf{ref}}(dx) U(x, y)$, and starting from the second iteration, both algorithms incorporate the same linear functional of the residual $U-(-\hat f)\oplus \hat f$, which captures the non-transitive structure of the pairwise utility $U$.
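The kernel in this abstract can be tried out on a finite output space. The following is a minimal sketch, not the authors' implementation: it assumes a finite set of outputs, so that $U$ becomes a matrix and $μ_{\mathsf{ref}}$ a probability vector, and the function names `mchf_kernel`, `step`, and `run_chain` are hypothetical. The toy utility $U(x,y)=f(y)-f(x)$ is purely transitive, so every row of the kernel is the same distribution, proportional to $μ_{\mathsf{ref}}(y)\exp(f(y))$, and the chain reaches it after a single step.

```python
import math

def mchf_kernel(U, mu_ref):
    """Row-stochastic kernel with P[x][y] proportional to exp(U[x][y]) * mu_ref[y].

    Finite-state illustration only; U is a square list of lists.
    """
    kernel = []
    for row in U:
        shift = max(row)  # subtract the row maximum for numerical stability
        weights = [math.exp(u - shift) * m for u, m in zip(row, mu_ref)]
        total = sum(weights)
        kernel.append([w / total for w in weights])
    return kernel

def step(dist, kernel):
    """One Markov chain step: the new distribution is dist multiplied by the kernel."""
    n = len(kernel)
    return [sum(dist[x] * kernel[x][y] for x in range(n)) for y in range(n)]

def run_chain(U, mu_ref, n_steps):
    """Start from mu_ref and apply the kernel n_steps times."""
    kernel = mchf_kernel(U, mu_ref)
    dist = list(mu_ref)
    for _ in range(n_steps):
        dist = step(dist, kernel)
    return dist

# Toy data (assumed for illustration): three outputs and a scalar reward f.
f = [0.0, 1.0, 2.0]
mu_ref = [0.5, 0.3, 0.2]

# Purely transitive utility U(x, y) = f(y) - f(x).
U_transitive = [[f[y] - f[x] for y in range(3)] for x in range(3)]
one_step = run_chain(U_transitive, mu_ref, 1)

# A cyclic (rock-paper-scissors style) utility that no scalar reward can represent.
U_cyclic = [[0.0, 1.0, -1.0], [-1.0, 0.0, 1.0], [1.0, -1.0, 0.0]]
after_ten = run_chain(U_cyclic, mu_ref, 10)
```

In the transitive case `one_step` already equals the reward-tilted reference distribution, which matches the abstract's remark that the first step recovers an RLHF solution. With `U_cyclic` the rows of the kernel differ, so further iterations keep changing the distribution.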

artificial intelligence, deep learning, machine learning, (13 more...)

arXiv.org Machine Learning

2606.22652

Country: North America > United States > New York (0.28)

Genre:

Research Report (0.50)
Workflow (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A Generalized Bisimulation Metric of State Similarity between Markov Decision Processes: From Theoretical Propositions to Applications

Neural Information Processing Systems | Jun-16-2026, 22:27:45 GMT

The bisimulation metric (BSM) is a powerful tool for computing state similarities within a Markov decision process (MDP), revealing that states closer in BSM have more similar optimal value functions. While BSM has been successfully utilized in reinforcement learning (RL) for tasks like state representation learning and policy exploration, its application to multiple-MDP scenarios, such as policy transfer, remains challenging. Prior work has attempted to generalize BSM to pairs of MDPs, but a lack of rigorous analysis of its mathematical properties has limited further theoretical progress. In this work, we formally establish a generalized bisimulation metric (GBSM) between pairs of MDPs, for which we rigorously prove three fundamental properties: GBSM symmetry, inter-MDP triangle inequality, and the distance bound on identical state spaces. Leveraging these properties, we theoretically analyse policy transfer, state aggregation, and sampling-based estimation in MDPs, obtaining explicit bounds that are strictly tighter than those derived from the standard BSM. Additionally, GBSM provides a closed-form sample complexity for estimation, improving upon existing asymptotic results based on BSM. Numerical results validate our theoretical findings and demonstrate the effectiveness of GBSM in multi-MDP scenarios.

artificial intelligence, machine learning, mdp, (16 more...)

Neural Information Processing Systems

Country: North America > United States (1.00)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.84)

Tight Bounds On The Distortion of Randomized and Deterministic Distributed Voting

Neural Information Processing Systems | Jun-15-2026, 22:42:43 GMT

We study metric distortion in distributed voting, where $n$ voters are partitioned into $k$ groups, each selecting a local representative, and a final winner is chosen from these representatives (or from the entire set of candidates). This setting models systems like U.S. presidential elections, where state-level decisions determine the national outcome. We focus on four cost objectives from Anshelevich et al. [1]: avg-avg, avg-max, max-avg, and max-max. We present improved distortion bounds for both deterministic and randomized mechanisms, offering a near-complete characterization of distortion in this model. For deterministic mechanisms, we reduce the upper bound for avg-max from 11 to 7, establish a tight lower bound of 5 for max-avg (improving on $2+\sqrt{5}$), and tighten the upper bound for max-max from 5 to 3. For randomized mechanisms, we consider two settings: (i) only the second stage is randomized, and (ii) both stages may be randomized. In case (i), we prove tight bounds: $5-2/k$ for avg-avg, 3 for avg-max and max-max, and 5 for max-avg. In case (ii), we show tight bounds of 3 for max-avg and max-max, and nearly tight bounds for avg-avg and avg-max within $[3-2/n,\ 3-2/(kn^*)]$ and $[3-2/n,\ 3]$, respectively, where $n^*$ denotes the largest group size.

artificial intelligence, distortion, machine learning, (17 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Government > Voting & Elections (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)

Efficient k-Sparse Band-Limited Interpolation with Improved Approximation Ratio

Neural Information Processing Systems | Jun-14-2026, 18:31:44 GMT

We consider the task of interpolating a $k$-sparse band-limited signal from a small collection of noisy time-domain samples. Exploiting a new analytic framework for hierarchical frequency decomposition that performs systematic noise cancellation, we give the first polynomial-time algorithm with a provable $(3+\sqrt{2}+ε)$-approximation guarantee for continuous interpolation. Our method breaks the long-standing $C > 100$ barrier set by the best previous algorithms, sharply reducing the gap to optimal recovery and establishing a new state of the art for high-accuracy band-limited interpolation. We also give a refined "shrinking-range" variant that achieves a $(\sqrt{2}+ε+c)$-approximation on any sub-interval $(1-c)T$ for some $c \in (0,1)$, which gives even higher interpolation accuracy.

artificial intelligence, data quality, machine learning, (19 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.67)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Information Technology > Data Science > Data Quality > Data Transformation (0.46)

CP-factorization for high dimensional tensor time series and double projection iterations

Chang, Jinyuan, Huang, Guanglin, Yao, Qiwei, Yu, Long

arXiv.org Machine Learning | Jun-9-2026

We adopt the canonical polyadic (CP) decomposition to model high-dimensional tensor time series. Our primary goal is to identify and estimate the factor loadings in the CP decomposition. We propose a one-pass estimation procedure through standard eigen-analysis for a matrix constructed based on the serial dependence structure of the data. The asymptotic properties of the proposed estimator are established under a general setting as long as the factor loading vectors are linearly independent, allowing the factors to be correlated and the factor loading vectors to be not nearly orthogonal. The procedure adapts to the sparsity of the factor loading vectors, accommodates weak factors, and demonstrates strong performance across a wide range of scenarios. To further reduce estimation errors, we also introduce an iterative algorithm based on a novel double projection approach. We theoretically justify the improved convergence rate of the iterative estimator, and derive the associated limiting distribution. A consistent estimator of the asymptotic variance is also provided, which plays a key role in the related inference problems. All results are validated through extensive simulations and two real data applications.

artificial intelligence, estimator, machine learning, (14 more...)

arXiv.org Machine Learning

2606.0856

Country: Asia > China (0.46)

Genre: Research Report > New Finding (0.92)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Implicit Regularization in Perturbed Deep Matrix Factorization: Spectral Conditions and Stability

Wang, Jingzhe, Chou, Hung-Hsu

arXiv.org Machine Learning | May-28-2026

This paper studies the stability of low-rank implicit regularization in perturbed deep matrix factorization, where the target matrix is corrupted by a noise matrix. We first derive sufficient spectral conditions under which gradient descent exhibits a low-rank phase in the noiseless setting. These conditions show how the target spectrum, initialization, and step size jointly determine the existence of a nonempty low-rank interval. We then analyze the perturbed gradient descent dynamics, proving convergence guarantees and quantifying how the perturbation affects iteration complexity and eigenvalue recovery. Finally, we show that the low-rank phase persists under perturbation, with explicit dependence on the perturbation size. Numerical experiments support the theoretical findings.

artificial intelligence, gradient descent, machine learning, (13 more...)

arXiv.org Machine Learning

2605.28613

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.72)

Optimizing Computational-Statistical Runtime for Wasserstein Distance Estimation

Jacobs, Peter Matthew, Phillips, Jeff M.

arXiv.org Machine Learning | May-20-2026

Squared Wasserstein distance is a frequently used tool to measure discrepancy between probability distributions. This distance is typically computed between empirical measures of size $n$ from two underlying random samples. Unfortunately, even in low-dimensional Euclidean space problems $\left( d \in \{2,3\} \right)$, algorithms for Wasserstein distance computation with approximate or exact precision guarantees scale poorly in runtime as a function of $n$ and the desired precision. In response, we consider the computational-statistical runtime, where the goal is to estimate from samples the Wasserstein distance between potentially smooth measures up to $ε$-additive error in expectation with respect to the sampling; we allow $O(1)$ computational cost for collecting a sample. Towards this, we develop a Sample-Sketch-Solve paradigm where we introduce a regular Cartesian grid sketch of the samples. We show that (especially under $α$-Hölder smooth distributions) this can compress the data without increasing asymptotic error, and also regularizes the structure, which enables faster exact algorithms. Ultimately, we approximate $W_2^2(P,Q)$ within $ε$ error in $ε^{-\max(2,\frac{d+1+o(1)}{1+α})}$ time for $0 < α < 1$ Hölder smooth distributions $P,Q$ on $(0,1)^{d}$; an optimal $Θ(ε^{-2})$ for $α > 1/2$ when $d=2$ and nearly optimal as $α \to 1$ when $d = 3$.
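The "sketch" stage of the Sample-Sketch-Solve paradigm can be illustrated with a short example. This is a minimal sketch under stated assumptions, not the authors' algorithm: the function name `grid_sketch` and the parameter `cells_per_axis` are hypothetical, samples are assumed to lie in $[0,1)^d$, and the abstract does not say how the grid resolution should be chosen. Each sample is snapped to the centre of its grid cell, and the result is a weighted point set whose size is bounded by the number of occupied cells rather than by $n$.

```python
from collections import Counter

def grid_sketch(points, cells_per_axis):
    """Compress samples in [0,1)^d onto a regular Cartesian grid.

    Returns a dict mapping each occupied cell centre (a tuple of floats)
    to the fraction of samples that fell into that cell.
    """
    counts = Counter()
    for p in points:
        # Integer cell index per coordinate, clamped so that 1.0 maps to the last cell.
        idx = tuple(min(int(c * cells_per_axis), cells_per_axis - 1) for c in p)
        counts[idx] += 1
    n = len(points)
    return {
        tuple((i + 0.5) / cells_per_axis for i in idx): k / n
        for idx, k in counts.items()
    }

# Toy 2D sample (assumed data): two points share a cell, two sit alone.
samples = [(0.1, 0.1), (0.12, 0.14), (0.9, 0.9), (0.6, 0.1)]
sketch = grid_sketch(samples, cells_per_axis=4)
```

The sketched measure can then be handed to any exact or approximate optimal transport solver in place of the raw samples. That solve stage is left out here.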

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

2605.20122

Country: North America > United States > Wisconsin (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

Multiscale Euclidean Network Trajectories: Second-Moment Geometry, Attribution, and Change Points

Ezoe, Haruka, Hisano, Ryohei

arXiv.org Machine Learning | May-12-2026

A central challenge in dynamic network analysis is to represent temporal evolution in a way that is both geometrically meaningful and statistically identifiable. One approach embeds a sequence of network snapshots as trajectories in a Euclidean space and relates these trajectories to node embeddings. In multilayer and unfolded spectral constructions, however, node embeddings and their underlying latent positions are identifiable only up to general linear transformations. Although this ambiguity preserves edge probabilities, it can distort geometry and invalidate distance-based temporal comparisons at both the trajectory and node levels. We develop Multiscale Euclidean Network Trajectories (MENT), a framework for multiscale temporal trajectories based on second-moment geometry. By imposing an isotropic normalization on the anchor latent positions, we reduce the relevant ambiguity to orthogonal transformations and prevent distortion of the second-moment geometry. In this canonical representation, we define a trace variation distance and mode-wise variation distances along orthogonal directions, and use multidimensional scaling to obtain low-dimensional trajectories of time points at both global and mode-wise levels. The resulting trajectories support interpretation and inference. They admit mode-wise decompositions, support attribution of global and mode-wise temporal changes to nodes, and enable change point detection through 1D trajectories. We prove consistency of the proposed unfolded spectral embedding and of the induced temporal trajectories. Experiments on two synthetic and two real dynamic networks illustrate stable and interpretable recovery of temporal structure and show strong performance against existing change point detection baselines.

data mining, machine learning, trajectory, (20 more...)

arXiv.org Machine Learning

2605.04589

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.67)

Industry:

Law > Civil Rights & Constitutional Law (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Energy (1.00)
(3 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Adaptive Estimation and Inference in Semi-parametric Heterogeneous Clustered Multitask Learning via Neyman Orthogonality

Chen, Hanxiao, Mukherjee, Debarghya

arXiv.org Machine Learning | May-5-2026

We study clustered multitask learning in a semiparametric setting where tasks share a latent cluster structure in their target parameters but exhibit heterogeneous, potentially infinite-dimensional nuisance components. Such heterogeneity poses a major challenge for existing multitask learning methods, which typically rely on aligned feature spaces or homogeneous task structures. To address this challenge, we propose an adaptive fused orthogonal estimator that integrates Neyman-orthogonal losses with data-driven pairwise fusion penalties. Our framework leverages task-specific pilot estimates to calibrate the fusion penalties and combines adaptive aggregation with orthogonalization to mitigate the impact of nuisance-parameter estimation error. Theoretically, we show that the proposed estimator achieves exact recovery of the latent clustering with high probability and attains pooled parametric convergence rates proportional to cluster size. Moreover, we establish asymptotic normality and show that, asymptotically, our estimator matches the performance of an oracle procedure that knows the true clustering in advance. Empirically, we show that the proposed method consistently outperforms strong baselines in various simulation setups. A real-world application to U.S. residential energy consumption demonstrates the effectiveness of our approach in uncovering meaningful regional clustering in electricity price elasticity.

artificial intelligence, machine learning, semi-parametric heterogeneous clustered multitask learning, (14 more...)

arXiv.org Machine Learning

2605.01907

Country: North America > United States (1.00)

Genre: Research Report (0.81)

Industry: Energy > Power Industry (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
